When Templeton starts, it first loads the default configuration files. These include:
Enter starting URL: | This allows you to specify where Templeton should begin.
The URL is in one of the forms:
For example, you can enter: http://www.intel.com or http://c.gp.cs.cmu.edu:5103/prog/webster or http://info.webcrawler.com/mak/projects/robots/robots.html |
Enter local path ["none" for log files only]: | This prompt asks where retrieved files should be placed. You may enter either a path (e.g. D:\FILES\ or /tmp/retrieve) or the word "none". "None" tells Templeton not to retrieve files. If you operate your own web server, you may specify the root directory for that web server. |
Host restriction [yes|no|host|.domain]: | This is the first restriction option. Templeton has the ability to retrieve from many machines, a few machines, or only one machine. |
Should the host's subtree be restricted [yes|no|/path]: | When restricting to a specific host, you may also specify a restrictive subtree on the host. Templeton will not follow links beyond the specified subtree. Entering "yes" restricts searches to the subtree specified in the initial URL. For example, http://c.gp.cs.cmu.edu:5103/prog/webster has the initial path "/prog". HTML documents not in the /prog directory would not be retrieved. Entering "no" places no restriction on the path, allowing Templeton to wander over the entire web site. Alternatively, you may specify a path. This is useful when the starting URL is not the top of the directory tree. (Frequently, a web page may not be reachable from a page "above" it. This "lower" page may still be the "root" of the virtual subtree.) |
Enter maximum depth [0 for unlimited]: | This allows you to specify the number of links to follow. '1' will only return the web page specified by the initial URL. '2' will retrieve the initial URL and all links from that page (restrictions permitting). The larger the number, the more levels of indirect links will be retrieved. Entering '0' will not restrict the number of links. If you are unsure of the number of links you will require, you should enter a finite number, such as '3', '5', or '10'. |
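The depth rule described above can be illustrated with a short Python sketch. This is not Templeton's code. It assumes a level-by-level (breadth-first) traversal, and the `crawl` function and the `SITE` link graph are hypothetical stand-ins for fetching real pages.

```python
from collections import deque


def crawl(start, links, max_depth):
    """Return URLs in visiting order, honoring the depth limit.

    links:     dict mapping a URL to the URLs it links to
               (a stand-in for fetching and parsing a page).
    max_depth: 0 means unlimited; 1 means only the starting URL.
    """
    seen = {start}
    order = []
    queue = deque([(start, 1)])  # the starting URL is at depth 1
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if max_depth and depth >= max_depth:
            continue  # at the limit: retrieve the page, do not follow its links
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return order


# Hypothetical link graph, for illustration only.
SITE = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": [],
    "D": ["E"],
}

if __name__ == "__main__":
    for limit in (1, 2, 3, 0):
        print(limit, crawl("A", SITE, limit))
```

With this graph, a depth of '1' yields only the starting page and '0' follows every link, which is why a finite depth is the safer choice on an unfamiliar site.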
An example response is:
Enter starting URL: http://www.cs.tamu.edu/people/
Enter local path ["none" for log files only]: /temp
Host restriction [yes|no|host|.domain]: yes
Should the host's subtree be restricted [yes|no|/path]: /people
Enter maximum depth [0 for unlimited]: 3
|
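To make the effect of the host and subtree answers concrete, here is a minimal Python sketch of how such restrictions could be checked. It is an illustration, not Templeton's implementation. The function names are hypothetical, and the readings of "yes" (the starting host only), "host" (exactly that host), and ".domain" (any host ending in that domain) are assumptions.

```python
from urllib.parse import urlsplit


def host_allowed(url, start_url, rule):
    """Assumed meaning of the host restriction answer."""
    host = (urlsplit(url).hostname or "").lower()
    start_host = (urlsplit(start_url).hostname or "").lower()
    rule = rule.lower()
    if rule == "no":
        return True                    # any machine
    if rule == "yes":
        return host == start_host      # only the starting machine
    if rule.startswith("."):
        return host.endswith(rule)     # any machine in the domain
    return host == rule                # one named machine


def path_allowed(url, start_url, rule):
    """Subtree restriction: "yes", "no", or an explicit /path."""
    path = urlsplit(url).path or "/"
    if rule == "no":
        return True
    if rule == "yes":
        # Use the directory part of the starting URL, e.g. /prog/webster -> /prog
        prefix = urlsplit(start_url).path.rsplit("/", 1)[0] or "/"
    else:
        prefix = rule
    prefix = prefix.rstrip("/")
    return path == prefix or path.startswith(prefix + "/")


START = "http://www.cs.tamu.edu/people/"

if __name__ == "__main__":
    for candidate in (
        "http://www.cs.tamu.edu/people/index.html",
        "http://www.cs.tamu.edu/research/",
        "http://www.intel.com/",
    ):
        ok = host_allowed(candidate, START, "yes") and path_allowed(candidate, START, "/people")
        print(candidate, "->", "follow" if ok else "skip")
```

With the answers from the example response, only URLs on www.cs.tamu.edu under /people pass both checks.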
Password required for realm = "Secret_Project"
Enter user name: myusername
Enter password:
|
If you incorrectly enter your user name or password, Templeton will prompt you to enter them again. If you do not know a valid user name or password, then enter a hyphen "-" for both fields. This will skip the protected URL.
A note about security: your user name and password are not secure. Basic authentication uses a simple encoding scheme, so simple that many people can read the encoded text without a computer! Anyone with a computer between you and the WWW server can view your user name and password and use them. There is no inherent security.
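The weakness is easy to demonstrate. HTTP Basic authentication sends "user:password" encoded in Base64, which is reversible without any key. The Python sketch below shows the round trip; the helper names and the password value are made up for illustration.

```python
import base64


def basic_auth_header(user, password):
    """Build the value of an HTTP 'Authorization' header for Basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


def decode_basic_auth(header):
    """Recover the user name and password from a Basic auth header value."""
    scheme, token = header.split(" ", 1)
    if scheme != "Basic":
        raise ValueError("not a Basic authentication header")
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password


if __name__ == "__main__":
    header = basic_auth_header("myusername", "mypassword")  # hypothetical password
    print(header)                     # what travels over the network
    print(decode_basic_auth(header))  # what any eavesdropper can recover
```

Because decoding needs no secret, anyone who can observe the request can recover the credentials.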
/temp/mapindex.html | An HTML document showing file links on the remote site. |
/temp/locindex.html | An HTML document showing file links in the local save-path. |
/temp/host.domain/ | Directory of files retrieved from the machine host.domain |
Current Depth: 2 (3 max)
Links at current depth: 7
Total links remaining: 137
Current URL: http://www.cs.tamu.edu/people/
Local file: /temp/www.cs.tamu.edu/people/index.html
IMAGE: Images/logos/csimage_basic.gif
LINK: Images/index.html
LINK: people/index.html
...
|
The status also shows the current URL being processed and the name of the local file (when the URL is being mirrored). Under the local file are the type and name of all links that are found.
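The relationship between the current URL and the local file name in the status display can be sketched as follows. This is a guess at the mapping based only on the example shown (save path, then host name, then URL path, with index.html used for a directory URL). The `local_path` function is hypothetical, and ports and query strings are ignored.

```python
import posixpath
from urllib.parse import urlsplit


def local_path(url, save_root, default_name="index.html"):
    """Map a URL to a file under save_root/host/path (assumed layout)."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if path.endswith("/"):
        path += default_name  # a directory URL gets a default file name
    return posixpath.join(save_root, parts.hostname or "", path.lstrip("/"))


if __name__ == "__main__":
    print(local_path("http://www.cs.tamu.edu/people/", "/temp"))
```

For the URL in the status example, this reproduces the local file name shown there.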
When all links are processed, the program will end. You may also break out of the program at any time.
?, H, or h | List the available commands |
a or A | Add a URL to be processed. You will be prompted for the URL to add. |
i or I | Interrupt the current file download. When pressed while "reading" a file from a server, the reading will stop and regular processing will continue. When pressed during the processing of a file, the processing is stopped and the next file is retrieved. This can be very useful when Templeton tries to retrieve an undesirable file that is extremely large (or time-consuming). |
l or L | List Restrictions. Templeton supports robot exclusion. Typing 'L' shows all known exclusion rules. There are 3 types of rules: |
s or S | Change the sleep interval. |
v or V | View the list of URLs to process. These are listed in the order that they will be processed, from top to bottom. This list includes images, map files, and documents. |
q or Q | Quit Templeton |
x or X | Exit Templeton. Currently, there is no difference between quitting and exiting. |
any other key | Any other key will pause the system. It is not considered "nice" to pause the system while it is reading from the remote server since you will be pausing a "live" network connection and taking valuable time from the remote WWW server. "Live" connections that are paused for extended durations will be closed by the remote server. |
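For readers unfamiliar with the robot exclusion rules that the List Restrictions command displays, the sketch below shows how a robots.txt file can be evaluated with Python's standard `urllib.robotparser` module. It does not reproduce Templeton's internal rule types. The robots.txt content and the "Templeton" agent name are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse rules without a network fetch


def may_fetch(url, agent="Templeton"):
    """Return True if the parsed rules permit this agent to fetch the URL."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.cs.tamu.edu/people/index.html",
        "http://www.cs.tamu.edu/private/notes.html",
    ):
        print(url, "->", "allowed" if may_fetch(url) else "excluded")
```

A well-behaved robot consults these rules before each retrieval and skips any excluded URL.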